(54) Data processing system with digital signal processor core and co-processor



(57) A data processing system includes a digital signal processor core
(110) and a co-processor (140). The co-processor (140) has a local memory
(141, 145, 147) within the address space of the said digital signal
processor core (110). The co-processor (140) responds to commands from the
digital signal processor core (110). A direct memory access circuit (120)
autonomously transfers data to and from the local memory (141, 145, 147)
of the co-processor (140). Co-processor commands are stored in a command
FIFO memory (141) mapped to a predetermined memory address. Control
commands include a receive data synchronism command stalling the
co-processor (140) until completion of a memory transfer into the local
memory (141, 145, 147). A send data synchronism command causes the
co-processor (140) to signal the direct memory access circuit (120) to
trigger memory transfer out of the local memory (141, 145, 147). An
interrupt command causes the co-processor (140) to interrupt the digital
signal processor core (110).

Description

TECHNICAL FIELD OF THE INVENTION

[0001] The present invention relates generally to the field of digital
signal processing, and more particularly to a digital signal processor
with a core data processor and a reconfigurable co-processor.

BACKGROUND OF THE INVENTION 

[0002] Digital signal processing is becoming more and more common for
audio and video processing. In many instances a single digital processor
can replace a host of prior discrete analog components. The increase in
processing capacity afforded by digital signal processors has enabled more
types of devices and more functions for prior devices. This process has
created the appetite for more complex functions and features on current
devices and new types of devices. In some cases this appetite has
outstripped the ability to cost effectively deliver the desired
functionality with fully programmable digital signal processors.

[0003] One response to this need is to couple a digital signal processor
with an application specific integrated circuit (ASIC). The digital signal
processor is programmed to handle control functions and some signal
processing. The full programmability of the digital signal processor
enables product differentiation through different programming. The ASIC is
constructed to provide processing hardware for certain core functions that
are commonly performed and time critical. With the increasing density of
integrated circuits it is now becoming possible to place a digital signal
processor and an ASIC hardware co-processor on the same chip.
[0004] This approach has two problems. This approach rarely results in an
efficient connection between the hardware co-processor ASIC and the
digital signal processor. It is typical to handle most of the interface by
programming the digital signal processor. In many cases the digital signal
processor must supply data pointers and commands in real time as the
hardware co-processor is operating. To form safe designs, it is typical to
provide extra time for the digital signal processor to service the
hardware co-processor. This means that the hardware co-processor is not
fully used. A second problem is design time. With the increasing
capability to design differing functionality, product cycles have been
reduced. This puts a premium on designing new functions quickly. The
ability to reuse programs and interfaces would aid in shortening design
cycles. However, the fixed functions implemented in the ASIC hardware
co-processor cannot easily be reused. The typical ASIC hardware
co-processor has a limited set of functions suitable for a narrow range of
problems. These designs cannot be quickly reused even to implement closely
related functions. In addition, the interface between the digital signal
processor and the ASIC hardware co-processor tends to use ad hoc
techniques that are specific to a particular product.

SUMMARY OF THE INVENTION 

[0005] This invention is a data processing system including a digital
signal processor core and a co-processor. The co-processor has a local
memory within the address space of the said digital signal processor core.
The co-processor is responsive to commands from the digital signal
processor core to perform predetermined data processing operations on data
stored in said local memory in parallel with the digital signal processor
core. The data processing system includes a direct memory access circuit
under the control of the digital signal processor core. The direct memory
access circuit autonomously transfers data to and from the local memory of
the co-processor.

[0006] The co-processor responds to commands to configure itself
correspondingly to perform a set of related data processing operations.
Co-processor commands are stored in a command first-in-first-out (FIFO)
memory. The command FIFO memory has an input mapped to a predetermined
memory address.
[0007] The co-processor is responsive to various control commands. A
receive data synchronism command pauses processing commands until the
direct memory access circuit signals completion of a memory transfer into
the local memory. A send data synchronism command causes the co-processor
to signal the direct memory access circuit to trigger a predetermined
memory transfer out of the local memory. An interrupt command causes the
co-processor to interrupt the digital signal processor core.
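
The three control commands can be pictured with a small behavioural model. The following Python sketch is illustrative only: the command names `RECEIVE_SYNC`, `SEND_SYNC` and `INTERRUPT`, the function `run_commands` and the counter `dma_done_events` are assumptions made for this example and do not appear in the specification.

```python
from collections import deque


def run_commands(commands, dma_done_events):
    """Process commands in FIFO order; return (log, still-pending commands).

    dma_done_events is a hypothetical count of inbound-transfer completions
    that the direct memory access circuit has signalled so far.
    """
    queue = deque(commands)
    log = []
    while queue:
        cmd = queue[0]
        if cmd == "RECEIVE_SYNC":
            if dma_done_events == 0:
                break  # stall: the inbound transfer has not completed yet
            dma_done_events -= 1
        elif cmd == "SEND_SYNC":
            log.append("trigger DMA transfer out")
        elif cmd == "INTERRUPT":
            log.append("interrupt DSP core")
        else:
            log.append("compute " + cmd)  # any other entry is a computation
        queue.popleft()
    return log, list(queue)
```

With no completion signalled, the model stalls at the receive synchronism command and nothing behind it in the queue runs; once a completion is available, the computation, the outbound transfer trigger and the interrupt follow in order.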

[0008] Each command includes an indication of a data input location within
the local memory. The co-processor recalls data from local memory starting
with the indicated data input location. Each command includes an
indication of a data output location within the local memory. The
co-processor stores resultant data in local memory starting with the
indicated data output location. The input data may be stored in a
circularly organized memory area serving as an input buffer. The resultant
data may be stored in a circularly organized memory area serving as an
output buffer.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0009] The present invention will now be further described by way of
example, with reference to the exemplary embodiments illustrated in the
accompanying drawings, in which:

Figure 1 illustrates the combination of a digital signal processor core
and a reconfigurable hardware co-processor in accordance with this
invention;
Figure 2 illustrates the memory map logical coupling between the digital
signal processor core and the reconfigurable hardware co-processor of this
invention;
Figure 3 illustrates a manner of using the reconfigurable hardware
co-processor memory;
Figure 4 illustrates a memory management technique useful for filter
algorithms;
Figure 5 illustrates an alternative embodiment of the combination of
Figure 1 including two co-processors with a private bus between;
Figure 6 illustrates the construction of a hardware co-processor which is
reconfigurable to perform a variety of filter functions;
Figure 7 illustrates the input formatter of the reconfigurable hardware
co-processor illustrated in Figure 6;
Figure 8 illustrates the reconfigurable data path core of the
reconfigurable hardware co-processor illustrated in Figure 6;
Figure 9 illustrates the output formatter of the reconfigurable hardware
co-processor illustrated in Figure 6;
Figure 10 illustrates the data flow connections through the data path core
for performing a real finite impulse response filter;
Figure 11 illustrates the data flow connections through the data path core
for performing a complex finite impulse response filter;
Figure 12 illustrates the data flow connections through the data path core
for performing a coefficient update function; and
Figure 13 illustrates the data flow connections through the data path core
for performing a fast Fourier transform.

DETAILED DESCRIPTION OF PREFERRED 
EMBODIMENTS 

[0010] Figure 1 illustrates circuit 100 including a digital signal
processor core 110 and a reconfigurable hardware co-processor 140. In
accordance with the preferred embodiment of this invention, these parts
are formed in a single integrated circuit. Digital signal processor core
110 may be of conventional design. In the preferred embodiment digital
signal processor core 110 is adapted to control direct memory access
circuit 120 for autonomous data transfers independent of digital signal
processor core 110. External memory interface 130 serves to interface the
internal data bus 101 and address bus 103 to their external counterparts,
external data bus 131 and external address bus 133, respectively. External
memory interface 130 is conventional in construction. Integrated circuit
100 may optionally include additional conventional features and circuits.
Note particularly that the addition of cache memory to integrated circuit
100 could substantially improve performance. The parts illustrated in
Figure 1 are not intended to exclude the provision of other conventional
parts. Those conventional parts illustrated in Figure 1 are merely the
parts most affected by the addition of reconfigurable hardware
co-processor 140.

[0011] Reconfigurable hardware co-processor 140 is coupled to other parts
of integrated circuit 100 via data bus 101 and address bus 103.
Reconfigurable hardware co-processor 140 includes command memory 141,
co-processor logic core 143, data memory 145 and coefficient memory 147.
Command memory 141 serves as the conduit by which digital signal processor
core 110 controls the operations of reconfigurable hardware co-processor
140. This feature will be further illustrated in Figure 2. Co-processor
logic core 143 is responsive to commands stored in command memory 141 to
perform co-processing functions. These co-processing functions involve
exchange of data between co-processor logic core 143 and data memory 145
and coefficient memory 147. Data memory 145 stores the input data
processed by reconfigurable hardware co-processor 140 and further stores
the resultant of the operations of reconfigurable hardware co-processor
140. The manner of storing this data will be further described below with
respect to Figure 2. Coefficient memory 147 stores the unchanging or
relatively unchanging process parameters called coefficients used by
co-processor logic core 143. Though data memory 145 and coefficient memory
147 have been illustrated as separate parts, it would be easy to employ
these merely as different portions of a single, unified memory. As will be
shown below, for the multiple multiply accumulate co-processor described
below it is best if such a single unified memory has two read ports for
data and coefficients and two write ports for writing the output data. It
is believed best that the memory accessible by reconfigurable hardware
co-processor 140 be located on the same integrated circuit in physical
proximity to co-processor logic core 143. This physical closeness is
needed to accommodate the wide memory buses required by the desired data
throughput of co-processor logic core 143.
[0012] Figure 2 illustrates the memory mapped interface between digital
signal processor core 110 and reconfigurable hardware co-processor 140.
Digital signal processor core 110 controls reconfigurable hardware
co-processor 140 via command memory 141. In the preferred embodiment
command memory 141 is a first-in-first-out (FIFO) memory. The write port
of command memory 141 is memory mapped into a single memory location
within the address space of digital signal processor core 110. Thus
digital signal processor core 110 controls reconfigurable hardware
co-processor 140 by writing commands to the address serving as the input
to command memory 141. Command memory 141 preferably includes two
circularly oriented pointers. The write pointer 151 points to the location
within command memory 141 where the next received command is to be stored.
Each time there is a write to the predetermined address of command memory
141, write pointer 151 selects the physical location receiving the data.
Following such a data write, write pointer 151 is updated to point to the
next physical location within command memory 141. Write pointer 151 is
circularly oriented in that it wraps around from the last physical
location to the first physical location. Reconfigurable hardware
co-processor 140 reads commands from command memory 141 in the same order
as they are received (FIFO) using read pointer 153. Read pointer 153
points to the physical location within command memory 141 storing the next
command to be read. Read pointer 153 is updated to reference the next
physical location within command memory 141 following each such read. Note
that read pointer 153 is also circularly oriented and wraps around from
the last physical location to the first physical location. Command memory
141 includes a feature preventing write pointer 151 from passing read
pointer 153. This may take place, for example, by refusing to write and
sending a memory fault signal back to digital signal processor core 110
when write pointer 151 and read pointer 153 reference the same physical
location. Thus the FIFO buffer of command memory 141 can be full and not
accept additional commands.
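
The pointer behaviour of such a command FIFO can be sketched in a few lines of Python. This is a minimal model, not the disclosed circuit: the class name `CommandFifo`, the `depth` parameter and the occupancy counter are assumptions introduced here. The counter is one common way to tell a full buffer from an empty one when both pointers reference the same location; the text itself only says the write is refused and a fault is signalled.

```python
class CommandFifo:
    """Circular command buffer with a write pointer and a read pointer."""

    def __init__(self, depth):
        self.slots = [None] * depth
        self.write_ptr = 0  # next physical location to receive a command
        self.read_ptr = 0   # next physical location the co-processor reads
        self.count = 0      # occupancy (assumed): separates full from empty

    def write(self, command):
        """Model a write to the mapped address; False models the fault signal."""
        if self.count == len(self.slots):
            return False  # full: the write pointer may not pass the read pointer
        self.slots[self.write_ptr] = command
        self.write_ptr = (self.write_ptr + 1) % len(self.slots)  # wrap around
        self.count += 1
        return True

    def read(self):
        """Return the oldest command, or None when the FIFO is empty."""
        if self.count == 0:
            return None
        command = self.slots[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.slots)  # wrap around
        self.count -= 1
        return command
```

In this model a third write into a two-entry FIFO is refused until the co-processor side has read one command, after which the write pointer wraps to the first location and the write succeeds.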
[0013] Data memory 145 and coefficient memory 147 are both mapped within
the data address space of digital signal processor core 110. As
illustrated in Figure 2, data bus 101 is bidirectionally coupled to memory
149. In accordance with the alternative embodiment noted above, both data
memory 145 and coefficient memory 147 are formed as a part of memory 149.
Memory 149 is also accessible by co-processor logic core 143 (not
illustrated in Figure 2). Figure 2 illustrates three circumscribed areas
of memory within memory 149. As will be further described below,
reconfigurable hardware co-processor 140 preferably performs several
functions employing differing memory areas. Note that due to the
[0014] Integrated circuit 100 operates as follows. Digital signal
processor core 110 controls the data and coefficients used by
reconfigurable hardware co-processor 140 by loading the data into data
memory 145 and the coefficients into coefficient memory 147.
Alternatively, digital signal processor core 110 loads the data and
coefficients into the unified memory 149. Digital signal processor core
110 may be programmed to perform this data transfer directly. Digital
signal processor core 110 may alternatively be programmed to control
direct memory access circuit 120 to perform this data transfer.
Particularly for audio or video processing applications, the data stream
is received at a predictable rate and from a predictable input device.
Thus it would typically be efficient for digital signal processor core 110
to control direct memory access circuit 120 to make transfers from
external memory to memory accessible by reconfigurable hardware
co-processor 140.
[0015] Following the transfer of data to be processed, digital signal
processor core 110 signals reconfigurable hardware co-processor 140 with
the command for the desired signal processing algorithm. As previously
stated, commands are sent to reconfigurable hardware co-processor 140 by a
memory write to a predetermined address. Received commands are stored in
command memory 141 on a first-in-first-out basis.
[0016] Each computational command of reconfigurable hardware co-processor
140 preferably includes a manner to specify the particular function to be
performed. In the preferred embodiment, reconfigurable hardware
co-processor 140 is constructed to be reconfigurable. Reconfigurable
hardware co-processor 140 has a set of functional units, such as
multipliers and adders, that can be connected together in differing ways
to perform different but related functions. The set of related functions
selected for each reconfigurable hardware co-processor will be based upon
a similarity of the mathematics of the functions. This similarity in
mathematics enables similar hardware to be reconfigured for the plural
functions. The command may indicate the particular computation via an
opcode in the manner of data processor instructions.

[0017] Each computational command includes a manner of specifying the
location of the data to be used by the computation. There are many
suitable methods of designating the data space. For example, the command
may specify a starting address and number of data words or samples within
the block. The data size may be specified as a parameter or it may be
specified by the opcode defining the computation type. As a further
example, the command may specify the data size, the starting address and
the ending address of the input data. Note that known indirect methods of
specifying where the input data is stored may be used. The command may
include a pointer to a register or a memory location storing any of these
parameters such as start address, data size, number of samples within the
data block and end address.

[0018] Each computational command must further indicate the memory address
range storing the output data for the particular command. This indication
may be made by any of the methods listed above with regard to the
locations storing the input data. In many cases the computational function
will be a filter function and the amount of output data following
processing will be about equivalent to the amount of input data. In other
cases, the amount of output data may be more or less than the amount of
input data. In any event, the amount of resultant data is known from the
amount of input data and the type of computational function requested.
Thus merely specifying the starting address provides sufficient
information to indicate where all the resultant data is to be stored. It
is feasible to store the output data in a destructive manner over-writing
input data during processing. Alternatively, the output data may be
written to a different portion of memory and the input data preserved at
least temporarily. The selection between these alternatives may depend
upon whether the input data will be reused.
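
One way to picture such a command is as a small record holding an opcode, an input start address with a word count, and an output start address. The Python sketch below is an illustration under stated assumptions: the field names, the opcodes `FIR` and `DECIMATE_BY_2`, and the `OUTPUT_RATIO` table are hypothetical and are not taken from the specification. It shows why the output start address alone is sufficient: the output length follows from the input length and the function type.

```python
from dataclasses import dataclass

# Hypothetical rule table: output words produced per input word, by opcode.
OUTPUT_RATIO = {"FIR": 1.0, "DECIMATE_BY_2": 0.5}


@dataclass
class Command:
    opcode: str     # selects the computation (assumed encoding)
    in_start: int   # starting address of the input block
    in_words: int   # number of input data words in the block
    out_start: int  # starting address for the resultant data

    def input_range(self):
        """Half-open address range [start, end) holding the input data."""
        return (self.in_start, self.in_start + self.in_words)

    def output_range(self):
        """Half-open output range derived from input size and opcode."""
        out_words = int(self.in_words * OUTPUT_RATIO[self.opcode])
        return (self.out_start, self.out_start + out_words)
```

Setting `out_start` equal to `in_start` would model the destructive, in-place alternative; a different `out_start` preserves the input block.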

[0019] Figure 3 illustrates one useful technique involving alternatively
employing two memory areas. One memory area 144 stores the input data
needed for the co-processor function. The relatively constant coefficients
are stored in coefficient memory 147. This data is recalled for use by
co-processor logic core 143 (1 read). The output data is written into the
second memory area 146 (1 write). Following use of the data memory area
144, direct memory access circuit 120 writes the data for the next block
overwriting the data previously used (2 write). At the same time, direct
memory access circuit 120 reads data from memory area 146 ahead of it
being overwritten by reconfigurable hardware co-processor 140 (2 read).
These two memory areas for input data and for resultant data could be
configured as circular buffers. In a product that requires plural related
functions, separate memory areas defined as circular buffers can be
employed. One memory area configured as a circular buffer will be
allocated to each separate function.

[0020] The format of computational commands preferably closely resembles
the format of a subroutine call instruction in a high level language. That
is, the command includes a command name similar in function to the
subroutine name specifying the particular computational function to be
performed. Each command also includes a set of parameters specifying
options available within the command type. These parameters may take the
form of direct quantities or variables, which are pointers to registers or
memory locations storing the desired quantities. The number and type of
these parameters depend upon the command type. This subroutine call format
is important in reusing programs written for digital signal processor core
110. Upon use the programmer or the compiler provides a stub subroutine to
activate reconfigurable hardware co-processor 140. This stub subroutine
merely receives the subroutine parameters and forms a corresponding
co-processor command using these parameters. The stub subroutine then
writes this command to the predetermined memory address reserved for
command transfer to reconfigurable hardware co-processor 140 and then
returns. This invention envisions that the computational capacity of
digital signal processor cores will increase regularly with time. Thus the
processing requirements of a particular product may require the
combination of digital signal processor core 110 and reconfigurable
hardware co-processor 140 at one point in time. At a later point in time,
the available computational capacity of an instruction set compatible
digital signal processor core may increase so that the functions
previously requiring a reconfigurable hardware co-processor may be
performed in software by the digital signal processor core. The prior
program code for the product may be easily converted to the new, more
powerful digital signal processor. This is achieved by providing
independent subroutines for each of the commands supported by the replaced
reconfigurable hardware co-processor. Then each place where the original
program employs the subroutine stub to transmit a command to the
reconfigurable hardware co-processor is replaced by the corresponding
subroutine call. Extensive reprogramming is thus avoided.
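
The stub idea can be sketched as two interchangeable routines with the same call signature. Everything named below is hypothetical: `COMMAND_ADDRESS`, the tuple used as a command word, the `bus_write` callback standing in for a memory-mapped write, and the simple direct-form filter used as the software replacement. The point of the sketch is only that calling code does not change when the stub is swapped for a software routine.

```python
COMMAND_ADDRESS = 0x8000  # hypothetical mapped address of the command FIFO


def make_fir_stub(bus_write):
    """Return a stub that forwards its parameters as a co-processor command."""
    def fir(in_start, n_samples, out_start):
        # Form the command from the subroutine parameters, write it, return.
        bus_write(COMMAND_ADDRESS, ("FIR", in_start, n_samples, out_start))
    return fir


def make_fir_software(memory, coeffs):
    """Return a drop-in replacement that computes the filter in software."""
    def fir(in_start, n_samples, out_start):
        for n in range(n_samples):
            acc = 0
            for k, c in enumerate(coeffs):
                if n - k >= 0:  # samples before the block are taken as zero
                    acc += c * memory[in_start + n - k]
            memory[out_start + n] = acc
    return fir
```

A program that calls `fir(0, 4, 8)` works with either version: the first only posts a command to the mapped address, the second performs the same computation on the core.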

[0021] Following completion of processing on one block of data, the data
may be transferred out of data memory 145 or unified memory 149. This
second transfer can take place either by direct action of digital signal
processor core 110 reading the data stored at the output memory locations
or through the aid of direct memory access circuit 120. This output data
may represent the output of the process. In this event, the data is
transferred to a utilization device. Alternatively, the output data of
reconfigurable hardware co-processor 140 may represent work in progress.
In this case, the data will typically be temporarily stored in memory
external to integrated circuit 100 for later retrieval and further
processing.

[0022] Reconfigurable hardware co-processor 140 is then ready for further
use. This further use may be additional processing of the same function.
In this case, the process described above is repeated on a new block of
data in the same way. This further use may be processing of another
function. In this case, the new data must be loaded into memory accessible
by reconfigurable hardware co-processor 140, the new command loaded and
then the processed data read for output or further processing.

[0023] Reconfigurable hardware co-processor 140 preferably will be able to
perform more than one function of the product algorithm. Many digital
signal processing tasks will use plural instances of similar functions.
For example, the process may include many similar filter functions.
Reconfigurable hardware co-processor 140 preferably has sufficient
processing capability to perform all these filter functions in real time.
The advantage of operating on blocks of data rather than discrete samples
will be evident when reconfigurable hardware co-processor 140 operates in
such a system. As an example, suppose that reconfigurable hardware
co-processor 140 performs three functions, A, B and C. These functions may
be sequential or they may be interleaved with functions performed by
digital signal processor core 110. Reconfigurable hardware co-processor
140 first performs function A on a block of data. This function is
performed as outlined above. Digital signal processor core 110 either
directly or by control of direct memory access circuit 120 loads the data
into memory area 155 of memory 149. Upon issue of the command for
configuration for function A which specifies the amount of data to be
processed, reconfigurable hardware co-processor 140 performs function A
and stores the resultant data in the portion of memory area 155 specified
by the command. A similar process occurs to cause reconfigurable hardware
co-processor 140 to perform function B on data stored in memory area 157
and return the result to memory area 157. The performance of function A
may take place upon data blocks having a size unrelated to the size of the
data blocks for function B. Finally, reconfigurable hardware co-processor
140 is commanded to perform function C on data within memory area 159,
returning the resultant to memory area 159. The block size for performing
function C is independent of the block sizes selected for functions A and
B.
[0024] The usefulness of the block processing is seen from this example.
The three functions A, B and C will typically have data flow rates that
are independent, that is, not necessarily equal. Provision of special
hardware for each function will sacrifice the generality of functionality
and reusability of reconfigurable hardware. Further, it would be difficult
to match the resources granted to each function in hardware to provide a
balance and the best utilization of the hardware. When reconfigurable
hardware is used there is inevitably an overhead cost for switching
between configurations. Operating on a sample by sample basis for flow
through the three functions would require a maximum number of such
reconfiguration switches. This would clearly be less than optimal. Thus
operating each function on a block of data before reconfiguration to
switch between functions would reduce this overhead. Additionally it would
then be relatively easy to allocate resources between the functions by
selecting the amount of time devoted to each function. Lastly, such block
processing would generally require less control overhead from the digital
signal processor core than switching between functions at a sample level.
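
The overhead argument can be made concrete with a small count. The sketch below is a simplified model under stated assumptions: each function is loaded once per block, all functions use the same block size, and the function name `reconfigurations` is introduced here for illustration. The text itself notes that block sizes may differ per function.

```python
def reconfigurations(n_samples, n_functions, block_size):
    """Count configuration loads when every function runs once per block."""
    n_blocks = -(-n_samples // block_size)  # ceiling division
    return n_blocks * n_functions
```

Under this model, passing 1024 samples through three functions costs 3072 configuration loads on a sample by sample basis (block size 1) and 48 loads with 64-sample blocks.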

[0025] The block sizes selected for the various functions A, B and C will
depend upon the relative data rates required and the data sizes. In
addition, the tasks assigned to digital signal processor core 110 and
their respective computational requirements must also be considered.
Ideally, both digital signal processor core 110 and reconfigurable
hardware co-processor 140 would be nearly fully loaded. This would result
in optimum use of the resources. Such balanced loading of digital signal
processor core 110 and reconfigurable hardware co-processor 140 may only
be achieved with product algorithms that can use reconfigurable hardware
co-processor 140 for about 50% of the computations. For the case in which
reconfigurable hardware co-processor 140 can perform more than half of the
minimum required computations, additional features implemented on digital
signal processor core 110 can be added to the product to match the
loading. This would result in use of spare computational resources in
digital signal processor core 110. The loading of computational processes
may be statically determined. Such static computational allocation can
best be made when both digital signal processor core 110 and
reconfigurable hardware co-processor 140 perform fixed and known
functions. If the computational load is expected to change with time, then
it will probably be best to dynamically allocate computational resources
between digital signal processor core 110 and reconfigurable hardware
co-processor 140 at run time. It is anticipated that the processes
performed by reconfigurable hardware co-processor 140 will remain
relatively stable and only the processes performed by digital signal
processor core 110 would vary.
[0026] Figure 4 shows a memory management technique that enables better
interruption of operations. Data 400 consisting of data blocks 401, 402
and 403 passes the window 410 of a finite impulse response filter. Such
filters operate on a time history of data. Three processes A, B and C
operate in respective circular buffers 421, 431 and 441 within data memory
145. Such a circular buffer enables the history to be preserved. Thus when
processing the next block following other processing, the history data is
available at predictable addresses for use. This history data is just
before the newly written data for the next block.

[0027] This technique works well except if memory space needs to be
cleared to permit another task. In that event, the history data could be
flushed and reloaded upon resumption of the filter processing.
Alternatively the history data needed for the next block could be moved to
another area of memory 145 or to an external memory attached to external
memory interface 130. Either of these methods is disadvantageous because
it requires time to move data. This delays either serving the interrupt or
resuming the original task.
[0028] A preferred alternative is illustrated schematically in Figure 4.
During the writing of the resultant data to its place in memory, the
current sample is written to a smaller area of memory. For example, input
data from circular buffer 421 is written into history buffer 423, input
data from circular buffer 431 is written into history buffer 433, and
input data from circular buffer 441 is written into history buffer 443.
Each of the history buffers 423, 433 and 443 is just the size needed to
store the history according to the width of the corresponding filter
window such as filter window 410. Upon completion of processing of a block
of data, the most recent history is stored in this restricted area. If the
co-processor must be interrupted, the data within the circular buffers
421, 431 and 441 may be cleared without erasing the history data stored in
history buffers 423, 433 and 443. This technique spares the need for
reloading the data or storing the data elsewhere prior to beginning the
interrupt task. In many filter tasks, enough write memory bandwidth will
be available to achieve writing to the history buffers without requiring
extra cycles. Another advantage of this technique is that less memory need
be allocated to circular buffers 421, 431 and 441 than previously. In the
previous technique, the circular buffers must be large enough to include
an entire block of data and an additional amount equal to the required
history data. The technique illustrated in Figure 4 enables the size of
the circular buffers 421, 431 and 441 to be reduced to just enough to
store one block of data.
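
A block filter that carries its history in a separate small buffer can be sketched as follows. This is a behavioural illustration only: the function name `fir_block`, the list-based buffers and the direct-form filter are assumptions made for the example. The history list plays the role of a history buffer such as 423, sized to the filter window minus one sample, and it is the only state that must survive between blocks.

```python
def fir_block(block, coeffs, history):
    """Filter one block of samples.

    history must hold the len(coeffs) - 1 samples that preceded the block.
    Returns (outputs, new_history); new_history is what would be kept in the
    small history buffer while the block buffer itself may be cleared.
    """
    taps = len(coeffs)
    samples = list(history) + list(block)
    outputs = []
    for n in range(len(block)):
        window = samples[n:n + taps]  # ordered oldest to newest
        outputs.append(sum(c * x for c, x in zip(coeffs, reversed(window))))
    new_history = samples[len(samples) - (taps - 1):] if taps > 1 else []
    return outputs, new_history
```

Filtering a stream as two consecutive blocks, passing the returned history from the first call into the second, gives the same outputs as filtering the whole stream in one call, which is the property that allows the block buffer to be reused or cleared between blocks.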
[0029] Many algorithms useful in audio and video signal processing involve
adapting coefficients. That is, there is some feedback path that changes
the function performed over time. An example of such an algorithm is a
modem that requires a time to adapt to the particular line employed and
the operation of the far end modem.






EP 0 045 788 A3 



12 



Initially it would seem that performing such adaptive functions in
block mode would adversely affect the convergence of these adaptive
functions. Review of the mathematics involved in many such functions
shows otherwise. The amount of adaption that can be performed at a
particular time generally depends upon the amount of data available
for computing the adaption. This amount of available data does not
depend upon whether the data is processed sample by sample or in
blocks of samples. In practice the rate of adaption will be about
the same. Adaption on a sample by sample basis would result in
convergence toward the fully adapted coefficients in many small
steps. Adaption based upon blocks of data would result in
convergence in fewer and larger steps. This is because the greater
amount of data available would drive a larger error term for
correction in the block processing case. However, the average
convergence slope would be the same for the two cases. In cases
where most of the adaption takes place upon initialization and most
of the processing takes place under steady state conditions, such as
the previous modem example, there would be little practical
difference. In cases where the adaptive filter must follow a moving
target, it is not clear whether adaption on a sample by sample basis
is better than adaption on a block basis. If, for example, the
process followed varies at a frequency greater than the inverse of
the time of the block size, then adaption on a block basis may
prevent useless hunting in small steps as compared with sample by
sample adaption. Thus adaptive filtering on a block basis has no
general disadvantage over adaptive filtering on a sample by sample
basis.
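
The comparison above may be illustrated with a small numerical sketch. This specification does not name a particular adaption algorithm; the least mean squares (LMS) update used here, the step size, the block size of 8 and the two coefficient test system are all assumptions chosen for illustration. The sample by sample version corrects the coefficients after every sample, while the block version accumulates the error terms over a block and applies one larger correction.

```python
import random


def lms_sample(x, d, taps, mu):
    """Sample by sample adaption: one small correction per input sample."""
    w = [0.0] * taps
    for n in range(taps - 1, len(x)):
        window = [x[n - k] for k in range(taps)]
        err = d[n] - sum(wk * xk for wk, xk in zip(w, window))
        w = [wk + mu * err * xk for wk, xk in zip(w, window)]
    return w


def lms_block(x, d, taps, mu, block):
    """Block adaption: accumulate the correction, apply it once per block."""
    w = [0.0] * taps
    grad = [0.0] * taps
    count = 0
    for n in range(taps - 1, len(x)):
        window = [x[n - k] for k in range(taps)]
        err = d[n] - sum(wk * xk for wk, xk in zip(w, window))
        grad = [g + err * xk for g, xk in zip(grad, window)]
        count += 1
        if count == block:
            w = [wk + mu * g for wk, g in zip(w, grad)]  # fewer, larger steps
            grad = [0.0] * taps
            count = 0
    return w


rng = random.Random(0)
true_w = [0.5, -0.25]  # hypothetical system to be identified
x = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
d = [0.0] + [true_w[0] * x[n] + true_w[1] * x[n - 1] for n in range(1, len(x))]
w_sample = lms_sample(x, d, 2, 0.05)
w_block = lms_block(x, d, 2, 0.05, 8)
```

Under these assumptions both variants settle on essentially the same coefficients, consistent with the observation that the average convergence slope is the same in the two cases.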
[0030] The command set of reconfigurable hardware co-processor 140
preferably includes several non-computational instructions for
control functions. These control functions will be useful in
cooperation between digital signal processor core 110 and
reconfigurable hardware co-processor 140. The first of these
non-computational commands is a receive data synchronization
command. This command will typically be used in conjunction with
data transfers handled by direct memory access circuit 120. Digital
signal processor core 110 will control the process by setting up the
input data transfer through direct memory access circuit 120.
Digital signal processor core 110 will send two commands to
reconfigurable hardware co-processor 140. The first command is the
receive data synchronization command. The second command is the
computational command desired.

[0031] Reconfigurable hardware co-processor 140 operates on commands
stored in command memory 141 on a first-in-first-out basis. Upon
reaching the receive data synchronization command, reconfigurable
hardware co-processor 140 will stop. Reconfigurable hardware
co-processor 140 will remain idle until it receives a control signal
from direct memory access circuit 120 indicating completion of the
input data transfer. Note that upon such completion of this input
data transfer, the data for the next block is stored in data memory
145 or unified memory 149. Direct memory access circuit 120 may be
able to handle plural queued data transfers. This is known in the
art as plural DMA channels. In this case the receive data
synchronization command must note the corresponding DMA channel,
which would be known to digital signal processor core 110 before
transmission of the receive data synchronization command. Direct
memory access circuit 120 would transmit the channel number of each
completed data transfer. This would permit reconfigurable hardware
co-processor 140 to match the completed direct memory access with
the corresponding receive data synchronization command.
Reconfigurable hardware co-processor 140 would continue to the next
command only if a completed direct memory access signal indicated
the same DMA channel as specified in the receive data
synchronization command.
[0032] Following this completion signal, reconfigurable hardware
co-processor 140 advances to the next command in command memory 141.
In this case this next command is a computational command using the
data just loaded. Since this computational command cannot start
until the previous receive data synchronization command completes,
this assures that the correct data has been loaded.
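
The stalling behavior described above may be modelled in a few lines. This is a behavioral sketch only: the class CoProcessor, the tuple encoding of commands and the method names are hypothetical, and the real co-processor is hardware rather than software. The sketch assumes that a completion signal carries the DMA channel number and that the command at the head of the FIFO stalls until the matching channel has signalled.

```python
from collections import deque


class CoProcessor:
    """Toy model of a command FIFO with a receive data synchronization."""

    def __init__(self):
        self.fifo = deque()         # stands in for command memory 141
        self.done_channels = set()  # DMA channels that signalled completion
        self.log = []               # computational commands that have run

    def push(self, command):
        self.fifo.append(command)

    def dma_complete(self, channel):
        self.done_channels.add(channel)

    def run(self):
        """Run queued commands until the FIFO empties or a sync stalls."""
        while self.fifo:
            kind, arg = self.fifo[0]
            if kind == "recv_sync":
                if arg not in self.done_channels:
                    return "stalled"  # wait for the matching DMA channel
                self.done_channels.discard(arg)
            else:
                self.log.append(arg)
            self.fifo.popleft()
        return "idle"


cp = CoProcessor()
cp.push(("recv_sync", 2))   # wait for the transfer on DMA channel 2
cp.push(("compute", "fir"))
state_before = cp.run()     # no transfer has completed yet
cp.dma_complete(1)          # a different channel does not release the stall
state_other = cp.run()
cp.dma_complete(2)
state_after = cp.run()
```

In this model the computational command runs only after channel 2 reports completion, without any intervention by the code that queued the two commands.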

[0033] This combination of the receive data synchronization command
and the computational command reduces the control burden on digital
signal processor core 110. Digital signal processor core 110 need
only set up direct memory access circuit 120 to make the input data
transfer and send the pair of commands to reconfigurable hardware
co-processor 140. This would assure that the input data transfer had
completed prior to beginning the computational operation. This
greatly reduces the amount of software overhead required by the
digital signal processor core 110 to control the function of
reconfigurable hardware co-processor 140. Otherwise, digital signal
processor core 110 may need to receive an interrupt from direct
memory access circuit 120 signaling the completion of the input data
load operation. An interrupt service routine must be written to
service the interrupt. In addition, such an interrupt would require
a context switch to send the co-processor command to command memory
and another context switch to return from the interrupt.
Consequently, the receive data synchronization command frees
considerable capacity within digital signal processor core 110 for
more productive use.

[0034] Another non-computational command is a send data
synchronization command. The send data synchronization command is
nearly the inverse of the receive data synchronization command. Upon
reaching the send data synchronization command, reconfigurable
hardware co-processor 140 triggers a direct memory access operation.
This direct memory access operation reads data from data memory 145
or unified memory 149 for storage at another system location. This
direct memory access operation may be preset by digital signal
processor core 110 and is merely begun upon receipt of a signal from
reconfigurable hardware co-processor 140 upon encountering the send
data synchronization command. In the case in which direct memory
access circuit 120 supports plural DMA channels, the send data
synchronization command must specify the DMA channel triggered.
Alternatively, the send data synchronization command may specify the
control parameters for direct memory access circuit 120, including
the DMA channel if more than one channel is supported. Upon
encountering such a send data synchronization command,
reconfigurable hardware co-processor 140 communicates directly with
direct memory access circuit 120 to set up and start an appropriate
direct memory access operation.

[0035] Another possible non-computational command is a
synchronization completion command. Upon encountering a
synchronization completion command, reconfigurable hardware
co-processor 140 sends an interrupt to digital signal processor core
110. Upon receiving such an interrupt, digital signal processor core
110 is assured that all prior commands sent to reconfigurable
hardware co-processor 140 have completed. Depending upon the
application, it may be better to control via interrupts than through
send and receive data synchronization commands. It may also be
better to queue several operations for reconfigurable hardware
co-processor 140 using send and receive data synchronization
commands and then interrupt digital signal processor core 110 at the
end of the queue. This may be useful for higher level control
functions by digital signal processor core 110 following the queued
operations by reconfigurable hardware co-processor 140.
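
A queue of the kind just described, several operations bracketed by receive and send data synchronization commands and closed by a single interrupt, may be sketched as follows. The function names build_queue and execute, the tuple encoding and the event names are hypothetical. The sketch treats a missing input transfer as a stall and does not model time.

```python
def build_queue(blocks):
    """Return the command list the core might queue for `blocks`.

    Each block is (input DMA channel, operation, output DMA channel).
    """
    queue = []
    for in_channel, operation, out_channel in blocks:
        queue.append(("recv_sync", in_channel))
        queue.append(("compute", operation))
        queue.append(("send_sync", out_channel))
    queue.append(("interrupt", None))  # one interrupt at the end of the queue
    return queue


def execute(queue, completed_inputs):
    """Walk the queue and record what the co-processor would do."""
    events = []
    for kind, arg in queue:
        if kind == "recv_sync" and arg not in completed_inputs:
            events.append(("stall", arg))
            break
        if kind == "compute":
            events.append(("ran", arg))
        elif kind == "send_sync":
            events.append(("dma_start", arg))
        elif kind == "interrupt":
            events.append(("irq", None))
    return events


queue = build_queue([(0, "fir", 1), (2, "fft", 3)])
all_ready = execute(queue, {0, 2})
one_ready = execute(queue, {0})
```

With both input transfers complete the model runs both operations, starts both output transfers and raises exactly one interrupt. With only the first input complete it stops at the second receive data synchronization command and raises no interrupt.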
[0036] Figure 5 illustrates another possible arrangement of circuit
100. Circuit 100' illustrated in Figure 5 includes two
reconfigurable hardware co-processors. Digital signal processor core
110 operates with first reconfigurable hardware co-processor 140 and
second reconfigurable hardware co-processor 180. A private bus 185
couples first reconfigurable hardware co-processor 140 and second
reconfigurable hardware co-processor 180. These co-processors have
private memories sharing the memory space of digital signal
processor core 110. The data can be transferred via private bus 185
by one co-processor writing to the address range encompassed by the
other co-processor's private memory. Alternatively, each
co-processor may have an output port directed toward an input port
of the other co-processor with the links between co-processors
encompassed in private bus 185. This construction may be
particularly useful for products in which data flows from one type
of operation handled by one co-processor to another type of
operation handled by the second co-processor. This private bus frees
digital signal processor core 110 from having to handle the data
handoff either directly or via direct memory access circuit 120.
[0037] Figures 6 to 9 illustrate the construction of an exemplary
reconfigurable hardware co-processor. This particular co-processor
is called a multiple multiply accumulator. The multiply-accumulate
operation, where the sum of plural products is formed, is widely
used in signal processing. Many filter algorithms are built around
these functions.

[0038] Figure 6 illustrates the overall general architecture of
multiple multiply accumulator 140. Data memory 145 and coefficient
memory 147 may be written to in 128 bit words. This write operation
is controlled by digital signal processor core 110 or direct memory
access circuit 120. Address generator 150 generates the addresses
for recall of data and coefficients used by the co-processor. This
read operation operates on data words of 128 bits from each memory.

[0039] These recalled data words are supplied to input formatter
160. Input formatter 160 performs various shift and alignment
operations generally to arrange the 128 bit input data words into
the order needed for the desired computation. Input formatter 160
outputs a 128 bit (8 by 16 bits) Data X, a 128 bit (8 by 16 bits)
Data Y and a 64 bit (2 by 32 bits) Data Z.
[0040] These three data streams are supplied to datapath 170.
Datapath 170 is the operational portion of the co-processor. As will
be further described below, datapath 170 includes plural hardware
multipliers and adders that are connectable in various ways to
perform a variety of multiply-accumulate operations. Datapath 170
outputs two adder data streams. Each of these is four 32 bit data
words.

[0041] These two data streams supply the inputs to output formatter
180. Output formatter 180 rearranges the two data streams into two
128 bit data words for writing back into the two memories. The
addresses for these write operations are computed by address
generator 150. This rearrangement may take care of alignment on
memory word boundaries.

[0042] The operations of co-processor 140 are under control of
control unit 190. Control unit 190 recalls the commands from command
memory 141 and provides the corresponding control within
co-processor 140.
[0043] The construction of input formatter 160 is illustrated in
Figure 7. Each of the two data streams of 128 bits is supplied to an
input of multiplexers 205 and 207. Each multiplexer independently
selects one input for storage in its corresponding register 215 and
217. Multiplexer 205 may select to recycle the contents of register
215 as well as either data stream. Multiplexer 207 may only select
one of the input data streams. Multiplexers 201 and 203 may select
the contents of register 215 or may select recycling of the contents
of their respective registers 211 and 213. Multiplexer 219 selects
the contents of either register 211 or 213 for supply to the upper
bits of shifter 221. The lower bits are supplied from register 215.
Shifter 221 shifts and selects only 128 bits of its 256 input bits.
These 128 bits are supplied to duplicate/swap unit 223.
Duplicate/swap unit 223 may duplicate a portion of its input into
the full 128 bits or it may rearrange the data order. Thus sorted,
the data is temporarily stored in register 225. This forms the Data
X input to datapath 170. The output of multiplexer 207 is supplied
directly to multiplexer 233 as well as supplied via register 217.
Multiplexer 233 selects 192 bits from the bits supplied to it. The
upper 128 bits form the Data Y input to datapath 170. These bits may
be recirculated via multiplexer 235. The lower 64 bits form the Data
Z input to datapath 170.

[0044] Figure 8 illustrates in block diagram form the construction
of datapath 170. Various segments of the Data X and the Data Y
inputs supplied from input formatter 160 are supplied to dual
multiply adders 310, 320, 330 and 340. As shown, the first and
second 16 bit data words Data X[0:1] and Data Y[0:1] are coupled to
dual multiply adder 310, the third and fourth 16 bit data words Data
X[2:3] and Data Y[2:3] are coupled to dual multiply adder 320, the
fifth and sixth 16 bit data words Data X[4:5] and Data Y[4:5] are
coupled to dual multiply adder 330 and the seventh and eighth 16 bit
data words Data X[6:7] and Data Y[6:7] are coupled to dual multiply
adder 340. Each of these units is identical; only dual multiply
adder 310 will be described in detail. The least significant 16 Data
X and Data Y bits supply inputs to multiplier 311. Multiplier 311
receives the pair of 16 bit inputs and produces a 32 bit product.
This product is stored in a pair of pipeline output registers. The
32 bit output is supplied to both sign extend unit 313 and an 8 bit
left shifter 314. Sign extend unit 313 repeats the sign bit of the
product, which is the most significant bit, to 40 bits. The 8 bit
left shifter 314 left shifts the 32 bit product and zero fills the
vacated least significant bits. One of these two 40 bit quantities
is selected in multiplexer 316 for application to a first input of
40 bit adder 319. In a similar fashion, the next most significant 16
Data X and Data Y bits are supplied to respective inputs of
multiplier 312. Multiplier 312 receives the two 16 bit inputs and
produces a 32 bit product. The product is stored in a pair of
pipeline registers. The 8 bit right shifter 315 right shifts the
product by 8 bits and zero fills the vacated most significant bits.
Multiplexer 317 selects from among three quantities. The first
quantity is a concatenation of the 16 Data X bits and the 16 Data Y
bits at the input. This input allows multiplier 312 to be bypassed.
If selected, the 32 bits (as sign extended by sign extend unit 318)
are added to the product produced by multiplier 311. The second
quantity is the product supplied by multiplier 312. The third
quantity is the shifted output of 8 bit right shifter 315. The
selected quantity from multiplexer 317 is sign extended to 40 bits
by sign extend unit 318. The sign extended 40 bit quantity is the
second input to 40 bit adder 319. Adder 319 is provided with 40 bits,
even though the 16 bit input factors would produce only 32 bits, to
provide dynamic range for plural multiply accumulates.
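
The behavior of one dual multiply adder may be approximated by the following bit-level sketch. The function names and the string selectors are hypothetical. The sketch assumes signed 16 bit inputs and, for the bypass path, assumes that the Data X bits occupy the upper half of the concatenation, which this specification does not state. Pipeline registers are not modelled.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value


def dual_multiply_add(x0, y0, x1, y1, first="extend", second="product"):
    """Model of dual multiply adder 310 with its two multiplexer choices."""
    p0 = to_signed(x0 * y0, 32)  # multiplier 311
    p1 = to_signed(x1 * y1, 32)  # multiplier 312

    # Multiplexer 316: sign extended product or product shifted left 8 bits.
    if first == "extend":
        a = p0
    elif first == "shift":
        a = to_signed(p0 << 8, 40)
    else:
        raise ValueError("first must be 'extend' or 'shift'")

    # Multiplexer 317: product, zero filled right shift, or bypass.
    if second == "product":
        b = p1
    elif second == "shift":
        b = (p1 & 0xFFFFFFFF) >> 8  # vacated upper bits are zero filled
    elif second == "bypass":
        # Assumed ordering: Data X in the upper 16 bits, Data Y in the lower.
        b = to_signed(((x1 & 0xFFFF) << 16) | (y1 & 0xFFFF), 32)
    else:
        raise ValueError("second must be 'product', 'shift' or 'bypass'")

    return to_signed(a + b, 40)  # 40 bit adder 319
```

For example, dual_multiply_add(3, 4, 5, 6) forms 3*4 + 5*6, the sum of two products that the first adder stage contributes to a multiply accumulate.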

[0045] The outputs of the adders 319 within each of the dual
multiply adder units 310, 320, 330 and 340 are provided as the first
adder stage output adder_st1_outp. Only the 32 most significant
adder output bits are connected to the output. This provides a 4 by
32 bit or 128 bit output.

[0046] A second stage of 40 bit adders includes adders 353 and 355.
Adder 353 adds the outputs of dual multiply adder units 310 and 320.
Adder 355 adds the outputs of dual multiply adder units 330 and 340.
Two other data paths join within the second adder stage. The least
significant 32 bits of the Data Z input are temporarily stored in
pipeline register 351. This 32 bit quantity is sign extended to 40
bits in sign extend unit 352. In a similar fashion, the most
significant bits of the Data Z input are temporarily stored in
pipeline register 357. This quantity is sign extended to 40 bits by
sign extend unit 358.
[0047] The third adder stage includes adders 361, 363, 367 and 368.
Adder 361 is 40 bits wide. It adds the output of adder 353 and the
sign extended least significant Data Z bits. The 32 most significant
bits of this sum are supplied as part of the third stage output
adder_st3_outp. Similarly, adder 363 is 40 bits wide and adds the
output of adder 355 and the sign extended most significant Data Z
bits. The 32 most significant bits of this sum are supplied as part
of the third stage output adder_st3_outp. The connections to adders
367 and 368 are much more complicated. The first input to adder 367
is either the output of adder 353 of the second stage or a
recirculated output as selected by multiplexer 364. Multiplexer 371
selects from among 8 pipeline registers for the recirculation
quantity. The second input to adder 367 is selected by multiplexer
365. This is either the least significant Data Z input as sign
extended by sign extend unit 352, the direct output of adder 368,
the output of adder 355 or a fixed rounding quantity rnd_add.
Addition of the fixed rounding quantity rnd_add causes the adder to
round the quantity at the other input. The output of adder 367
supplies the input to variable right shifter 375. Variable right
shifter 375 right shifts the sum a selected amount of 0 to 15 bits.
The 32 most significant bits of its output form a part of the third
stage output adder_st3_outp. The first input to adder 368 is the
output of adder 355. The second input to adder 368 is selected by
multiplexer 366. Multiplexer 366 selects either the output of adder
353, the most significant Data Z input as sign extended by sign
extend unit 358, the recirculation input or the fixed rounding
quantity rnd_add. Multiplexer 373 selects the recirculation quantity
from among 8 pipeline registers at the output of adder 368. The
output of adder 368 supplies the input to variable right shifter
377. Variable right shifter 377 right shifts the sum a selected
amount of 0 to 15 bits. The 32 most significant bits of its output
form another part of the third stage output adder_st3_outp.
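
The effect of adding a rounding quantity ahead of a variable right shift may be seen in the following sketch. This specification states only that rnd_add is a fixed rounding quantity; the choice of one half of the least significant retained bit, computed here from the shift amount, is an assumption, as is the use of an arithmetic (sign preserving) shift. The function name is hypothetical.

```python
def round_and_shift(total, shift):
    """Add a rounding constant, then shift right by 0 to 15 bits."""
    if not 0 <= shift <= 15:
        raise ValueError("shift must be in the range 0 to 15")
    # Assumed value of rnd_add: half of the least significant retained bit.
    rnd_add = (1 << (shift - 1)) if shift > 0 else 0
    return (total + rnd_add) >> shift
```

Under this assumption round_and_shift(5, 1) rounds 2.5 up to 3, whereas a shift without the rounding quantity would truncate it to 2.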

[0048] Figure 9 illustrates the construction of the output formatter
180 illustrated in Figure 6.
[0049] Figures 10 to 13 illustrate several ways that multiple
multiply accumulate co-processor 140 may be configured. The data
flow in each of these examples can be achieved by proper selection
of the multiplexers within datapath 170. The following description
will note the corresponding multiplexer selections when they are
relevant to achieving the desired data flow.
[0050] Figure 10 illustrates the data flow in a real finite impulse
response (FIR) filter. Data D0 to D7 and coefficients C0 to C7 are
supplied to respective multipliers 311, 312, 321, 322, 331, 332, 341
and 342. In this case the multiplexers corresponding to multiplexer
317 in dual multiply adder unit 310 each select the product of the
respective multipliers 312, 322, 332 and 342. Pairs of products are
summed in adders 319, 329, 339 and 349. Pairs of these sums are
further summed in adders 353 and 355. The sums formed by adders 353
and 355 are added in adder 368. In this case, multiplexer 366
selects the sum produced by adder 353 for the second input to adder
368. Adder 367 does the accumulate operation. Multiplexer 364
selects the output of multiplexer 371, selecting a pipeline register
for recirculation, as the first input to adder 367. Multiplexer 365
selects the output of adder 368 as the second input to adder 367.
Adder 367 produces the filter output. Note that this data flow
produces the sum of the 8 products formed with the prior summed
products. This operation is generally known as multiply accumulate
and is widely used in filter functions. Configuration of datapath
170 as illustrated in Figure 10 permits computation of the
accumulated sum of 8 products. This greatly increases the throughput
in this data flow over the typical single product accumulation
provided by digital signal processor core 110.
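
The adder tree of this data flow may be sketched as follows. The function names mac8 and fir_output are hypothetical. The sketch uses unbounded Python integers and therefore does not model the 40 bit adder widths, the pipeline registers or the output shifters; it assumes a filter length that is a multiple of 8.

```python
def mac8(data, coeffs, acc=0):
    """Sum of 8 products added to a running accumulation."""
    assert len(data) == 8 and len(coeffs) == 8
    products = [d * c for d, c in zip(data, coeffs)]  # multipliers 311 to 342
    # First adder stage: adders 319, 329, 339 and 349 sum pairs of products.
    stage1 = [products[i] + products[i + 1] for i in range(0, 8, 2)]
    sum_353 = stage1[0] + stage1[1]  # adder 353
    sum_355 = stage1[2] + stage1[3]  # adder 355
    sum_368 = sum_353 + sum_355      # adder 368
    return acc + sum_368             # adder 367 accumulates


def fir_output(samples, coeffs):
    """One filter output, consuming 8 taps per pass through the datapath."""
    acc = 0
    for i in range(0, len(coeffs), 8):
        acc = mac8(samples[i:i + 8], coeffs[i:i + 8], acc)
    return acc
```

A 16 tap output thus takes two passes in this sketch, where a single product accumulator would take sixteen.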

[0051] Figure 11 illustrates the data flow of a complex FIR filter.
This data flow is similar to that of the real FIR filter illustrated
in Figure 10. The data flow of Figure 11 simultaneously operates on
the real and imaginary parts of the computation. Data and
coefficients are supplied to respective multipliers 311, 312, 321,
322, 331, 332, 341 and 342. Multiplexers corresponding to
multiplexer 317 in dual multiply adder unit 310 each select the
product of the respective multipliers 312, 322, 332 and 342. Pairs
of products are summed in adders 319, 329, 339 and 349. Pairs of
these sums are further summed in adders 353 and 355. The real and
imaginary parts are separately handled by adders 367 and 368.
Multiplexer 365 selects the sum of adder 353 for the second input to
adder 367. Multiplexer 364 selects the output of multiplexer 371,
selecting a pipeline register for recirculation, as the first input
to adder 367. Adder 368 receives the sum of adder 355 as its first
input. Multiplexer 366 selects the recirculation output of
multiplexer 373 for the second input to adder 368. The pair of
adders 367 and 368 thus produce the real and imaginary parts of the
multiply accumulate operation.

[0052] Figure 12 illustrates the data flow in a coefficient update
operation. The error terms E0 to E3 are multiplied by the
corresponding weighting terms W0 to W3 in multipliers 311, 321, 331
and 341. The current coefficients to be updated C0 to C3 are input
directly to adders 319, 329, 339 and 349 as selected by multiplexers
317, 327, 337 and 347. The respective products are added to the
current values in adders 319, 329, 339 and 349. In this case the
output is produced by adders 319, 329, 339 and 349 via the adder
stage 1 output adder_st1_outp.
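
Arithmetically, this data flow computes each new coefficient as the current coefficient plus the product of an error term and a weighting term. A minimal sketch, with a hypothetical function name and without modelling bit widths or the bypass path encoding, is:

```python
def update_coefficients(coeffs, errors, weights):
    """Return C[i] + E[i] * W[i] for four coefficients at a time."""
    assert len(coeffs) == len(errors) == len(weights) == 4
    return [c + e * w for c, e, w in zip(coeffs, errors, weights)]


updated = update_coefficients([10, 20, 30, 40], [1, -1, 2, 0], [2, 3, 4, 5])
```

Only one multiplier in each dual multiply adder is used here; the second adder input carries the current coefficient instead of a second product.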

[0053] Figure 13 illustrates the data flow in a fast Fourier
transform (FFT) operation. The FFT operation starts with a 16 bit by
32 bit multiply operation. This is achieved as follows. Each dual
multiply adder 310, 320, 330 and 340 receives a respective 16 bit
quantity A0 to A3 at one input of each of the paired multipliers 311
and 312, 321 and 322, 331 and 332, and 341 and 342. Multipliers 311,
321, 331 and 341 receive the 16 most significant bits of the 32 bit
quantity B0H to B3H. Multipliers 312, 322, 332 and 342 receive the
16 least significant bits of the 32 bit quantity B0L to B3L.
Shifters 314, 315, 324, 325, 334, 335, 344 and 345 are used to align
the products. Multiplexers 316, 326, 336 and 346 select the left
shifted quantity from respective 8 bit left shifters 314, 324, 334
and 344 for the first input into respective adders 319, 329, 339 and
349. Multiplexers 317, 327, 337 and 347 select the right shifted
quantity from respective 8 bit right shifters 315, 325, 335 and 345
as the second inputs to respective adders 319, 329, 339 and 349.
These two oppositely directed 8 bit shifts provide an effective 16
bit shift for aligning the partial products for a 16 bit by 32 bit
multiply. Pairs of these sums are further summed in adders 353 and
355. Adder 361 adds the Data Z0 input with the output from adder
353. Multiplexer 364 selects the sum of adder 353 as the first input
to adder 367. Multiplexer 365 selects the Data Z0 input as the
second input to adder 367. Adder 368 receives the sum of adder 355
as its first input. Multiplexer 366 selects the Data Z1 input as the
second input to adder 368. Adder 363 adds the sum of adder 355 and
the Data Z1 input. The output of the FFT operation is provided by
the sum outputs of adders 361, 367, 368 and 363.
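
The alignment of the two partial products may be checked with a short worked sketch. The function name is hypothetical. The sketch assumes that the multiplier receiving the lower half treats it as an unsigned 16 bit value and that the right shift preserves the sign of the partial product; this specification describes the right shifter as zero filling, so the sketch should be read as an arithmetic illustration rather than a description of the hardware.

```python
def mul16x32_shifted(a, b):
    """Form (a * b) >> 8 from two 16 by 16 bit partial products."""
    b_high = b >> 16    # signed upper 16 bits, B0H
    b_low = b & 0xFFFF  # lower 16 bits taken as unsigned, B0L
    upper = (a * b_high) << 8  # left shift by 8, as in shifter 314
    lower = (a * b_low) >> 8   # right shift by 8, as in shifter 315
    return upper + lower       # summed as in adder 319
```

Since b equals b_high * 65536 + b_low, the product a * b equals (a * b_high) * 65536 + a * b_low. The two partial products therefore differ in weight by 16 bits, and shifting one left by 8 and the other right by 8 supplies exactly that relative offset while leaving the sum scaled down by 8 bits.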

[0054] The list below is a partial list of some of the commands that
may be performed by the datapath 170 of multiple multiply accumulate
unit 140 illustrated in Figures 6 to 9.

vector_add_16b(len, pdata, pcoeff, pout)
vector_add_32b(len, pdata, pcoeff, pout)
vector_mpy_16b(len, pdata, pcoeff, pout)
vector_mpy_1632b(len, pdata, pcoeff, pout)
vector_mpy_32b(len, pdata, pcoeff, pout)
scalar_vector_add_16b(len, pdata, pcoeff, pout)
scalar_vector_add_32b(len, pdata, pcoeff, pout)
scalar_vector_mpy_16b(len, pdata, pcoeff, pout)
scalar_vector_mpy_1632b(len, pdata, pcoeff, pout)
scalar_vector_mpy_32b(len, pdata, pcoeff, pout)

For these operations, the operation name indicates the data size.
The "len" parameter field indicates the length of the function. The
"pdata" parameter field is a pointer to the beginning memory address
containing the input data. The "pcoeff" parameter field is a pointer
to the beginning memory address containing the coefficients for the
filter. The "pout" parameter field is a pointer to the beginning
memory address to receive the output. As previously described, these
pointers preferably point to respective locations within data memory
145 and coefficient memory 147 or unified memory 149.

FFT_real(fft_size, pdata, pcoeff, pout)
FFT_complex(fft_size, pdata, pcoeff, pout)

The fast Fourier transform operations preferably all include 32 bit
data and 16 bit coefficients as previously described in conjunction
with Figure 13. The fft_size parameter field defines the size of the
function. The other listed parameter fields are as described above.

FIR_real(us, ds, len, blocksize, pdata, pcoeff, pout)
FIR_complex_real(us, ds, len, blocksize, pdata, pcoeff, pout)
FIR_complex_real_sum(us, ds, len, blocksize, pdata, pcoeff, pout)
FIR_complex(us, ds, len, blocksize, pdata, pcoeff, pout)

The finite impulse response filter operations differ in the type of
the data and coefficients. The FIR_real operation employs real data
and real coefficients. The FIR_complex_real operation employs
complex data and real coefficients. The FIR_complex_real_sum
operation separately sums the complex and real parts employing
complex data and real coefficients. The FIR_complex operation
employs both complex data and complex coefficients. The us parameter
field indicates the upsampling ratio. The ds parameter field
indicates the downsampling ratio. The blocksize parameter field
indicates the size of the operational blocks employed. The other
parameter fields are as previously described.
The parameters of all these commands could be either immediate
values or, for the data, coefficient and output locations, 16 bit
address pointers into the co-processor memory. This selection would
mean that the finite impulse filter commands, which are the longest,
would require about five 16 bit command words. This would be an
insignificant amount of bus traffic. Alternatively, the parameter
fields could be indirect; that is, identify a register from a
limited set of registers for each parameter. There could be a set of
8 registers for each parameter, requiring only 3 bits each within
the command word. Since only a limited number of particular filter
settings would be required, this is feasible.
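
The indirect alternative may be illustrated by packing seven 3 bit register indices beneath an operation code. The field order, the position of the operation code and the function names below are assumptions made for the sketch; this specification does not define a command word layout.

```python
# Hypothetical field order for a finite impulse response command.
FIELDS = ("us", "ds", "len", "blocksize", "pdata", "pcoeff", "pout")


def encode_indirect(opcode, **regs):
    """Pack one 3 bit register index per parameter below the opcode."""
    word = opcode
    for name in FIELDS:
        index = regs[name]
        if not 0 <= index < 8:
            raise ValueError("register index must fit in 3 bits")
        word = (word << 3) | index
    return word


def decode_indirect(word):
    """Recover the opcode and the register index of each parameter."""
    regs = {}
    for name in reversed(FIELDS):
        regs[name] = word & 0x7
        word >>= 3
    return word, regs


params = dict(us=1, ds=2, len=3, blocksize=4, pdata=5, pcoeff=6, pout=7)
word = encode_indirect(5, **params)
opcode, decoded = decode_indirect(word)
```

Seven parameters at 3 bits each occupy 21 bits in this layout, compared with about five 16 bit command words, or 80 bits, for the form using immediate values and 16 bit pointers.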



Claims 

1. A data processing system disposed on a single integrated circuit
comprising:

a digital signal processor core connected to a data bus and an
address bus, said digital signal processing core operable for
generating co-processor commands;

a co-processor connected to said data bus, said address bus and said
digital signal processing core, said co-processor having a local
memory within the address space of said digital signal processor
core and responsive to commands generated by said digital signal
processor core to perform predetermined data processing operations
on data stored in said local memory in parallel to said digital
signal processor core.

2. The data processing system of claim 1, further comprising:

a direct memory access circuit under the control of said digital
signal processor and capable of autonomously transferring data
between predefined addresses in memory including transferring data
to and from said local memory of said co-processor.

3. The data processing system of claim 2, wherein:

said co-processor is responsive to a receive data synchronism
command for pausing processing commands until said direct memory
access circuit signals completion of a predetermined memory transfer
of data into said local memory.

4. The data processing system of claim 2, wherein:

said co-processor is responsive to a send data synchronism command
for signalling said direct memory access circuit to trigger a
predetermined memory transfer of data out of said local memory.

5. The data processing system of any preceding claim, wherein:

said co-processor further includes a command first-in-first-out
memory having an input responsive to data written to a predetermined
memory address and an output for controlling operation of said
co-processor.

6. The data processing system of any preceding claim, wherein said
co-processor is responsive to said commands for configuring itself
correspondingly whereby said co-processor is operable to perform a
set of related data processing operations.

7. The data processing system of any preceding claim, wherein said
co-processor is responsive to an interrupt command for transmitting
an interrupt signal to said digital signal processor core.

8. The data processing system of any preceding claim, wherein each
command includes an indication of a data input location within said
local memory; and

said co-processor is responsive to said commands to recall data from
said local memory starting with said indicated data input location.

9. The data processing system of any preceding claim, wherein each
command includes an indication of a data output location within said
local memory; and

said co-processor is responsive to said commands for storing
resultant data from a data processing operation corresponding to
said command in local memory starting with said indicated data input
location.

10. A method of data processing comprising the steps of:

providing a local memory within a co-processor having addresses
within a memory map of a digital signal processor core;

transferring data to said local memory;

transmitting a command to said co-processor thereby causing said
co-processor to perform a corresponding data processing operation in
parallel to said digital signal processor core and store results in
said local memory; and

transferring said results out of said local memory of said
co-processor.

11. The method of claim 10 wherein:

said step of transferring data to said local memory comprises
storing data in a next location in a circularly organized memory
area serving as an input buffer.

12. The method of claim 10 or claim 11 wherein:

said step of storing results in said local memory comprises storing
data in a next location in a circularly organized memory area
serving as an output buffer.

13. The method of claim 12, further comprising:

storing input data within a circularly organized history buffer
having a size corresponding to a time extent of said corresponding
data processing operation substantially concurrently with said step
of storing results in said local memory.